485 research outputs found

    Minimization and estimation of the variance of prediction errors for cross-validation designs

    We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of these estimates as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and the covariances depend solely on the data's probability distribution. We rewrite this scalar product in such a form that the initially large number of summands can gradually be decreased down to three under the validity of a quadratic approximation to the core covariances. We show an analytical example in which this quadratic approximation holds exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only through a constant and can, therefore, be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that Balanced Incomplete Block Designs have smaller variance than K-fold cross-validation. We confirm this property in a real data example from the UCI machine learning repository. We finally show how to find Balanced Incomplete Block Designs in practice.
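The K-fold cross-validation estimator discussed in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's method: it uses a trivial "predict the training mean" regressor on synthetic data, and all names (`kfold_error`, the interleaved fold assignment) are assumptions of this sketch.

```python
import random

def kfold_error(y, k):
    """K-fold cross-validation estimate of the mean squared prediction
    error for a regressor that always predicts the training mean."""
    n = len(y)
    folds = [y[i::k] for i in range(k)]  # simple interleaved fold assignment
    errors = []
    for j in range(k):
        # learning set: all folds except fold j
        train = [v for i, f in enumerate(folds) if i != j for v in f]
        mean = sum(train) / len(train)   # "fit" the mean predictor
        # test on the held-out fold j
        errors.extend((v - mean) ** 2 for v in folds[j])
    return sum(errors) / n

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(100)]
est = kfold_error(y, k=5)
```

For standard-normal data the estimate lands near the noise variance of 1; the paper's point is that the *variance* of such an estimate depends on the resampling design, with Balanced Incomplete Block Designs improving on plain K-fold.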

    A U-statistic estimator for the variance of resampling-based error estimators

    We revisit resampling procedures for error estimation in binary classification in terms of U-statistics. In particular, we exploit the fact that the error rate estimator involving all learning-testing splits is a U-statistic. Therefore, several standard theorems on properties of U-statistics apply. In particular, it has minimal variance among all unbiased estimators and is asymptotically normally distributed. Moreover, there is an unbiased estimator for this minimal variance if the total sample size is at least twice the learning set size plus two. In this case, we exhibit such an estimator, which is itself a U-statistic. It enjoys, again, various optimality properties and yields an asymptotically exact hypothesis test of the equality of error rates when two learning algorithms are compared. Our statements apply to any deterministic learning algorithm under weak non-degeneracy assumptions. In an application to tuning parameter choice in lasso regression on a gene expression data set, the test does not reject the null hypothesis of equal rates between two different parameters.
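The estimator this abstract describes averages the test error over *all* learning-testing splits, which makes it a U-statistic. For a tiny sample that average can be enumerated exactly, as in the hedged sketch below; the 1-nearest-neighbour rule, the toy data, and the names `one_nn` and `u_statistic_error` are illustrative assumptions, not the paper's setup.

```python
from itertools import combinations

def one_nn(train, x):
    """Predict the label of the nearest training point (1-NN rule)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def u_statistic_error(data, g):
    """Average test error rate over ALL C(n, g) learning sets of size g.
    Averaging over every split is what makes this a U-statistic."""
    n = len(data)
    total, count = 0.0, 0
    for idx in combinations(range(n), g):
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in idx]
        errs = sum(one_nn(train, x) != y for x, y in test)
        total += errs / len(test)
        count += 1
    return total / count

# two well-separated classes on the real line: (feature, label)
data = [(0.1, 0), (0.3, 0), (0.2, 0), (1.9, 1), (2.1, 1), (2.3, 1)]
err = u_statistic_error(data, g=4)
```

Because every size-4 learning set here necessarily contains both classes and the classes are well separated, the enumerated error rate is exactly zero; with real data the same enumeration (or an incomplete version of it) yields the estimator whose minimal variance the abstract discusses.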

    Borderline

    When choosing from the over 1,000 different proposals we received for the Media Façade of the Museum of Contemporary Art, Zagreb (MSU), my attention was captured by the video material that Mathias Fuchs had sent us. The sharp colors that characterized the video, the lights blinking with explosions, the sharp movements of the 'warriors' that were coordinated and choreographed but at the same time aggressive and warlike: all of these elements combined to give the viewer a feeling of being trapped between a choreographed dance and an imminent transformation of the characters into a real menace ready to pop out of the screen.

    Inequality Trends for Germany in the Last Two Decades: A Tale of Two Countries

    In this paper we first document inequality trends in wages, hours worked, earnings, consumption, and wealth for Germany over the last twenty years. We generally find that inequality was relatively stable in West Germany until the German unification (which happened politically in 1990 and in our data in 1991), and then trended upwards for wages and market incomes, especially after about 1998. Disposable income and consumption, on the other hand, display only a modest increase in inequality over the same period. These trends occurred against the backdrop of lower trend growth of earnings, incomes and consumption in the 1990s relative to the 1980s. In the second part of the paper we further analyze the differences between East and West Germans in terms of the evolution of levels and inequality of wages, income, and consumption.

    A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

    The mean prediction error of a classification or regression procedure can be estimated using resampling designs such as the cross-validation design. We decompose the variance of such an estimator associated with an arbitrary resampling procedure into a small linear combination of covariances between elementary estimators, each of which is a regular parameter as described in the theory of U-statistics. The enumerative combinatorics of the occurrence frequencies of these covariances govern the linear combination's coefficients and, therefore, the variance's large scale behavior. We study the variance of incomplete U-statistics associated with kernels which are partly but not entirely symmetric. This leads to asymptotic statements for the prediction error's estimator, under general non-empirical conditions on the resampling design. In particular, we show that the resampling based estimator of the average prediction error is asymptotically normally distributed under a general and easily verifiable condition. Likewise, we give a sufficient criterion for consistency. We thus develop a new approach to understanding small-variance designs as they have recently appeared in the literature. We exhibit the U-statistics which estimate these variances. We present a case from linear regression where the covariances between the elementary estimators can be computed analytically. We illustrate our theory by computing estimators of the studied quantities in an artificial data example.
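The Central Limit Theorem claimed in this abstract can be probed empirically, though not proved, by Monte Carlo: draw many datasets, compute a resampling-based error estimate on each, and inspect the spread of the resulting estimates. The sketch below is such an illustration under stated assumptions (a training-mean predictor, random half/half splits, Gaussian data); it is not the paper's analytic variance decomposition, and the names `holdout_mse` and the split scheme are hypothetical.

```python
import random
import statistics

def holdout_mse(y, n_splits, seed):
    """Average squared error of the training-mean predictor over random
    half/half learning-testing splits (a simple resampling design)."""
    rng = random.Random(seed)
    n, half = len(y), len(y) // 2
    estimates = []
    for _ in range(n_splits):
        idx = set(rng.sample(range(n), half))      # learning set indices
        train = [y[i] for i in idx]
        test = [y[i] for i in range(n) if i not in idx]
        m = sum(train) / len(train)                # fit the mean predictor
        estimates.append(sum((v - m) ** 2 for v in test) / len(test))
    return sum(estimates) / n_splits

# one resampling-based estimate per simulated dataset
rng = random.Random(1)
estimates = [holdout_mse([rng.gauss(0.0, 1.0) for _ in range(60)], 20, s)
             for s in range(200)]
mean_est = statistics.mean(estimates)
sd_est = statistics.stdev(estimates)
```

Across datasets the estimates cluster around the true noise variance of 1 with a nonzero spread; the paper's contribution is to express that spread exactly as a small linear combination of covariances determined by the resampling design.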